06. TD Control: Q-Learning
Please watch the video below to learn about Q-Learning (or Sarsamax), a second method for TD control.
[Video: TD Control: Sarsamax]
Check out this (optional) research paper to read the proof that Q-Learning (or Sarsamax) converges.
## Pseudocode
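As a minimal sketch of Q-Learning (Sarsamax), the Python below assumes a tabular setting with a small, discrete action set. After each step, the estimate for the visited state-action pair is moved toward the reward plus the discounted value of the best action in the next state. The names `q_learning`, `sarsamax_update`, `epsilon_greedy`, and the toy `corridor_step` environment are hypothetical and chosen only for illustration; they do not come from the course materials.

```python
import random
from collections import defaultdict


def sarsamax_update(Q, state, action, reward, next_state, done, alpha, gamma):
    """Apply one Q-Learning (Sarsamax) update to Q[state][action] in place."""
    # Value of the greedy action in the next state; zero if the episode ended.
    best_next = 0.0 if done else max(Q[next_state])
    target = reward + gamma * best_next
    Q[state][action] += alpha * (target - Q[state][action])


def epsilon_greedy(Q, state, n_actions, epsilon, rng):
    """Pick a random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    values = Q[state]
    return max(range(n_actions), key=lambda a: values[a])


def q_learning(step, start_state, n_actions, num_episodes,
               alpha=0.5, gamma=1.0, epsilon=0.1, seed=0):
    """Run tabular Q-Learning; `step(state, action)` is an assumed interface
    returning (next_state, reward, done)."""
    rng = random.Random(seed)
    Q = defaultdict(lambda: [0.0] * n_actions)
    for _ in range(num_episodes):
        state, done = start_state, False
        while not done:
            action = epsilon_greedy(Q, state, n_actions, epsilon, rng)
            next_state, reward, done = step(state, action)
            sarsamax_update(Q, state, action, reward, next_state, done,
                            alpha, gamma)
            state = next_state
    return Q


def corridor_step(state, action):
    """Toy environment (an assumption for this sketch): states 0..3 in a row.
    Action 1 moves right, action 0 moves left (blocked at state 0).
    Every step costs -1; reaching state 3 ends the episode."""
    next_state = state + 1 if action == 1 else max(state - 1, 0)
    return next_state, -1.0, next_state == 3


Q = q_learning(corridor_step, start_state=0, n_actions=2, num_episodes=500)
greedy_policy = {s: max(range(2), key=lambda a: Q[s][a]) for s in range(3)}
```

The agent selects actions with an epsilon-greedy policy, but the update target always uses the maximum over next-state action values. That difference between the behavior policy and the target is what separates Sarsamax from Sarsa.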
